PSG hybrid approach to automatic corpus annotation
Abstract
This paper describes and evaluates a hybrid non-probabilistic parsing method for the grammatical annotation of large corpora and the live analysis of teaching sentences. The method employs a layered scheme of lexicon- and context-based Constraint Grammars (CG) on the one hand, and Phrase Structure Grammars (PSG) or syntactic bracketing algorithms on the other. It has been fully implemented by the author for Danish and Portuguese and, to a certain degree, for Spanish. Add-on modules were also produced for existing English and French taggers. On running newspaper text, overall correctness rates (F-scores) for the two most mature systems approach 99% for word class (PoS) and 95-96% for syntactic function tags at the shallow CG level. Although CG errors propagate into structural errors, the subsequent constituent tree analysis adds under 1% of new attachment errors on manually revised CG input. All modules in combination, without revision, generate 50-75% structurally "legal" syntactic trees.
Similar resources
Arborest – a VISL-Style Treebank Derived from an Estonian Constraint Grammar Corpus
Treebank creation is a very labor-consuming task, especially if the applications intended include machine learning, gold standard parser evaluation or teaching, since only a manually checked syntactically annotated corpus can provide optimal support for these purposes. There are, however, possibilities to make the annotation process (partly) automatic, saving (manual) annotation time and/or all...
PurePos 2.0: a hybrid tool for morphological disambiguation
We present PurePos, an open-source HMM-based automatic morphological annotation tool. PurePos can perform tagging and lemmatization at the same time and is very fast to train. Symbolic rule-based components can easily be integrated into the annotation process to boost the accuracy of the tool. The hybrid approach implemented in PurePos is especially beneficial...
Fuzzy Neighbor Voting for Automatic Image Annotation
With the rapid development of digital images and the availability of imaging tools, massive amounts of images are created. Therefore, efficient management and suitable retrieval, especially by computers, is one of the most challenging fields in image processing. Automatic image annotation (AIA) refers to attaching words, keywords or comments to an image or to a selected part of it. In this paper,...
Tags Re-ranking Using Multi-level Features in Automatic Image Annotation
Automatic image annotation is a process in which computer systems automatically assign textual tags related to the visual content of a query image. In most cases, inappropriate user-generated tags and images without any tags, both among the challenges in this field, have a negative effect on the query's result. In this paper, a new method is presented for automatic image...
Automatic Selection of HPSG-Parsed Sentences for Treebank Construction
This article presents an ensemble parse approach to detecting and selecting high-quality linguistic analyses output by a hand-crafted HPSG grammar of Spanish implemented in the LKB system. The approach uses full agreement (i.e., exact syntactic match) along with a MaxEnt parse selection model and a statistical dependency parser trained on the same data. The ultimate goal is to develop a hybrid ...